Generative Adversarial Network (GAN) adaptation aims to transfer a pre-trained GAN to a target domain with limited training data. In this paper, we focus on the one-shot case, which is more challenging and rarely explored in previous works. We consider that the adaptation from a source domain to a target domain can be decoupled into two parts: the transfer of the global style (e.g., texture and color), and the emergence of new entities that do not belong to the source domain. While previous works mainly focus on style transfer, we propose a novel and concise framework\footnote{\url{https://github.com/thevoidname/Generalized-One-shot-GAN-Adaption}} to address the \textit{generalized one-shot adaptation} task for both style and entity transfer, in which a reference image and its binary entity mask are provided. Our core idea is to constrain the gap between the internal distributions of the reference and the synthesized images via the sliced Wasserstein distance. To better achieve this, style fixation is first used to roughly obtain the exemplary style, and an auxiliary network is introduced into the original generator to disentangle entity and style transfer. Besides, to realize cross-domain correspondence, we propose a variational Laplacian regularization to constrain the smoothness of the adapted generator. Both quantitative and qualitative experiments demonstrate the effectiveness of our method in various scenarios.
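The central constraint above, the sliced Wasserstein distance, admits a compact Monte-Carlo form. The sketch below (PyTorch; the function name, the equal-size assumption on the two point sets, and the number of projections are illustrative choices, not taken from the paper) shows the projection-sort-compare recipe:

import torch

def sliced_wasserstein(x, y, n_projections=128):
    """Monte-Carlo sliced Wasserstein distance between two point clouds.

    x, y: (n, d) tensors of feature vectors (e.g., internal patch
    features); the two sets are assumed to have the same size n.
    """
    d = x.shape[1]
    # Random projection directions on the unit sphere.
    theta = torch.randn(d, n_projections, device=x.device)
    theta = theta / theta.norm(dim=0, keepdim=True)
    # Project both sets to 1-D, sort, and compare order statistics.
    px, _ = torch.sort(x @ theta, dim=0)
    py, _ = torch.sort(y @ theta, dim=0)
    return ((px - py) ** 2).mean()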
In this paper, we focus on analyzing and improving the dropout technique for the self-attention layers of Vision Transformers (ViTs), which is important yet surprisingly overlooked by previous works. In particular, we conduct research on three core questions. First, what to drop in self-attention layers? Different from dropping attention weights as in the literature, we propose to move the dropout operation forward, ahead of the attention matrix calculation, and set the Key as the dropout unit, yielding a novel dropout-before-softmax scheme. We theoretically verify that this scheme helps keep both the regularization and the probability features of attention weights, alleviating overfitting to specific patterns and enhancing the model's ability to capture vital information. Second, how to schedule the drop ratio across consecutive layers? In contrast to exploiting a constant drop ratio for all layers, we present a new decreasing schedule that gradually lowers the drop ratio along the stack of self-attention layers. We experimentally validate that this schedule avoids overfitting in low-level features and missing information in high-level semantics, improving the robustness and stability of model training. Third, is a structured dropout operation, as in CNNs, necessary? We attempt patch-based block-style dropout operations and find that this useful trick for CNNs is not essential for ViTs. Given the exploration of the above three questions, we present a novel DropKey method that regards the Key as the drop unit and exploits a decreasing schedule for the drop ratio, improving ViTs in a general way. Comprehensive experiments demonstrate the effectiveness of DropKey for various ViT architectures, \emph{e.g.}, T2T and VOLO, and for various vision tasks, \emph{e.g.}, image classification, object detection, human-object interaction detection, and human body shape recovery. The code will be released upon acceptance.
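One natural reading of "setting the Key as the dropout unit before the attention matrix" is to mask key logits before the softmax, so the surviving attention weights still normalize to one. The sketch below is a minimal PyTorch illustration under that reading; the function name and tensor layout are assumptions, and the per-(query, key) Bernoulli mask is one plausible variant rather than the paper's exact implementation:

import torch
import torch.nn.functional as F

def attention_with_dropkey(q, k, v, drop_ratio=0.1, training=True):
    """Scaled dot-product attention that drops keys before the softmax.

    q, k, v: (batch, heads, tokens, dim). Masked entries are excluded
    from the softmax, so the surviving weights still sum to one.
    """
    logits = (q @ k.transpose(-2, -1)) * q.shape[-1] ** -0.5
    if training and drop_ratio > 0:
        # Push dropped key logits to -inf before normalization; with a
        # small ratio the chance of masking every key is negligible.
        drop = torch.rand_like(logits) < drop_ratio
        logits = logits.masked_fill(drop, float("-inf"))
    return F.softmax(logits, dim=-1) @ v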
Generating images from a single sample, as a newly developing branch of image synthesis, has attracted extensive attention. In this paper, we formulate this problem as sampling from the conditional distribution of a single image, and propose a hierarchical framework that simplifies the learning of this complex conditional distribution through the successive learning of the distributions of structure, semantics, and texture, making both learning and generation comprehensible. On this basis, we design ExSinGAN, composed of three cascaded GANs, for learning an explainable generative model from a given image, where the cascaded GANs model the distributions of structure, semantics, and texture in turn. ExSinGAN learns not only from the internal patches of the given image, as previous works do, but also from external priors obtained via GAN inversion techniques. Benefiting from this appropriate combination of internal and external information, ExSinGAN achieves more powerful generation and competitive generalization ability on image manipulation tasks compared with previous works.
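As a toy illustration of the cascaded coarse-to-fine design (not the authors' architecture; the stage internals and the resolution schedule below are placeholders), each stage could refine the upsampled output of the previous one:

import torch.nn as nn

class CascadedGenerator(nn.Module):
    """Toy three-stage cascade (structure -> semantics -> texture)."""

    def __init__(self, channels=3, width=64):
        super().__init__()
        def stage():
            return nn.Sequential(
                nn.Conv2d(channels, width, 3, padding=1), nn.ReLU(),
                nn.Conv2d(width, channels, 3, padding=1))
        self.stages = nn.ModuleList(stage() for _ in range(3))

    def forward(self, z):
        x = z  # coarse noise map, e.g. (b, 3, 32, 32)
        for i, s in enumerate(self.stages):
            if i > 0:  # grow resolution between stages
                x = nn.functional.interpolate(x, scale_factor=2)
            x = x + s(x)  # residual refinement at this scale
        return x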
Blind image quality assessment (BIQA) remains challenging due to the diversity of distortions and the variation of image content, which complicate the distortion patterns across different scales and aggravate the difficulty of the regression problem in BIQA. However, existing BIQA methods often fail to consider multi-scale distortion patterns and image content, and little research has been done on learning strategies that make the regression model perform better. In this paper, we propose a simple yet effective Progressive Multi-Task Image Quality Assessment (PMT-IQA) model, which contains a multi-scale feature extraction module (MS) and a progressive multi-task learning module (PMT), to help the model learn complex distortion patterns and better optimize the regression problem, in line with the easy-to-hard law of the human learning process. To verify the effectiveness of the proposed PMT-IQA model, we conduct experiments on four widely used public datasets; the results indicate that PMT-IQA outperforms the comparison approaches and that both the MS and PMT modules improve the model's performance.
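One plausible reading of the easy-to-hard progressive multi-task idea pairs a coarse auxiliary task with the target regression and shifts the loss weight over training. The head design, task pair, and weighting schedule below are assumptions for illustration, not the paper's modules:

import torch.nn as nn
import torch.nn.functional as F

class PMTHead(nn.Module):
    """Hypothetical progressive multi-task head: an easy auxiliary task
    (coarse quality-level classification) plus the target regression."""

    def __init__(self, feat_dim=256, n_levels=5):
        super().__init__()
        self.cls = nn.Linear(feat_dim, n_levels)  # easy: coarse level
        self.reg = nn.Linear(feat_dim, 1)         # hard: exact score

    def loss(self, feat, level, score, progress):
        """progress in [0, 1]: fraction of training completed."""
        l_cls = F.cross_entropy(self.cls(feat), level)
        l_reg = F.mse_loss(self.reg(feat).squeeze(-1), score)
        # Easy-to-hard: start dominated by the easy task, end by the hard one.
        return (1 - progress) * l_cls + progress * l_reg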
It has been observed in practice that applying pruning-at-initialization methods to neural networks and training the sparsified networks can not only retain the testing performance of the original dense models, but also sometimes even slightly boost the generalization performance. A theoretical understanding of such experimental observations is yet to be developed. This work makes the first attempt to study how different pruning fractions affect the model's gradient descent dynamics and generalization. Specifically, this work considers a classification task for overparameterized two-layer neural networks, where the network is randomly pruned according to different rates at initialization. It is shown that as long as the pruning fraction is below a certain threshold, gradient descent can drive the training loss toward zero and the network exhibits good generalization performance. More surprisingly, the generalization bound gets better as the pruning fraction gets larger. To complement this positive result, this work further shows a negative result: there exists a large pruning fraction such that while gradient descent is still able to drive the training loss toward zero (by memorizing noise), the generalization performance is no better than random guessing. This further suggests that pruning can change the feature learning process, which leads to the performance drop of the pruned neural network. To our knowledge, this is the \textbf{first} generalization result for pruned neural networks, suggesting that pruning can improve the neural network's generalization.
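A minimal sketch of the setup described above, assuming PyTorch; random pruning is applied at initialization and a fixed gradient mask keeps pruned weights at zero during training (the helper name and network sizes are illustrative):

import torch
import torch.nn as nn

def prune_at_init(layer: nn.Linear, fraction: float):
    """Randomly zero a fraction of weights at initialization and keep
    them zero during training via a fixed gradient mask."""
    mask = (torch.rand_like(layer.weight) >= fraction).float()
    layer.weight.data *= mask
    layer.weight.register_hook(lambda grad: grad * mask)
    return mask

# Overparameterized two-layer network, randomly pruned at initialization.
net = nn.Sequential(nn.Linear(100, 4096), nn.ReLU(), nn.Linear(4096, 1))
prune_at_init(net[0], fraction=0.5)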
Time-series anomaly detection is an important task and has been widely applied in industry. Since manual data annotation is expensive and inefficient, most applications adopt unsupervised anomaly detection methods, but the results are usually sub-optimal and unsatisfactory to end customers. Weak supervision is a promising paradigm for obtaining considerable labels in a low-cost way, enabling customers to label data by writing heuristic rules rather than annotating each instance individually. However, in the time-series domain, it is hard for people to write reasonable labeling functions, as time-series data is numerically continuous and difficult to interpret. In this paper, we propose a Label-Efficient Interactive Time-Series Anomaly Detection (LEIAD) system, which enables a user to improve the results of unsupervised anomaly detection with only a small number of interactions with the system. To achieve this goal, the system integrates weak supervision and active learning collaboratively while generating labeling functions automatically from only a few labeled data. All of these techniques are complementary and can promote each other in a mutually reinforcing manner. We conduct experiments on three time-series anomaly detection datasets, demonstrating that the proposed system is superior to existing solutions in both the weak supervision and active learning areas. The system has also been tested in a real industrial scenario to show its practicality.
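For intuition, labeling functions in this weak-supervision style are short heuristics that vote per timestamp and may abstain. The two rules below are hypothetical examples of what a user might write, not the system's built-ins:

import numpy as np

# Votes per timestamp: ANOMALY (1), NORMAL (0), or ABSTAIN (-1).
ABSTAIN, NORMAL, ANOMALY = -1, 0, 1

def lf_spike(x: np.ndarray, z: float = 4.0) -> np.ndarray:
    """Flag points far from the series median (robust z-score)."""
    med = np.median(x)
    mad = np.median(np.abs(x - med)) + 1e-8
    return np.where(np.abs(x - med) / mad > z, ANOMALY, ABSTAIN)

def lf_flatline(x: np.ndarray, w: int = 10) -> np.ndarray:
    """Mark long constant runs (a possible stuck sensor) as anomalous."""
    out = np.full(len(x), ABSTAIN)
    for i in range(len(x) - w):
        if np.ptp(x[i:i + w]) == 0:
            out[i:i + w] = ANOMALY
    return out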
As an important variant of entity alignment (EA), multi-modal entity alignment (MMEA) aims to discover identical entities across different knowledge graphs (KGs) that carry multiple modalities such as images. However, current MMEA algorithms all adopt KG-level modality fusion strategies and ignore modality differences among individual entities, hurting robustness to potential noise in the modalities (e.g., unidentifiable images and relations). In this paper, we present MEAformer, a multi-modal entity alignment transformer approach for meta modality hybrid, which dynamically predicts the mutual correlation coefficients among modalities for instance-level feature fusion. A modal-aware hard entity replay strategy is also proposed to address vague entity details. Extensive experimental results show that our model not only achieves SOTA performance in multiple training scenarios, including supervised, unsupervised, iterative, and low-resource settings, but also has limited parameters, competitive speed, and good interpretability. Our code will be available soon.
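A toy stand-in for instance-level (rather than KG-level) modality fusion, assuming per-entity modality embeddings; the module below predicts per-entity mixing weights with a learned scorer and is far simpler than the MEAformer architecture:

import torch
import torch.nn as nn

class InstanceLevelFusion(nn.Module):
    """Toy per-entity modality fusion with learned mixing weights."""

    def __init__(self, dim=128):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, mods):
        # mods: (batch, n_modalities, dim) per-entity modality embeddings.
        w = torch.softmax(self.score(mods).squeeze(-1), dim=-1)
        # Weighted sum: each entity gets its own modality mixture.
        return (w.unsqueeze(-1) * mods).sum(dim=1)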
The task of video prediction and generation is known to be notoriously difficult, with the research in this area largely limited to short-term predictions. Though plagued with noise and stochasticity, videos consist of features that are organised in a spatiotemporal hierarchy, different features possessing different temporal dynamics. In this paper, we introduce Dynamic Latent Hierarchy (DLH) -- a deep hierarchical latent model that represents videos as a hierarchy of latent states that evolve over separate and fluid timescales. Each latent state is a mixture distribution with two components, representing the immediate past and the predicted future, causing the model to learn transitions only between sufficiently dissimilar states, while clustering temporally persistent states closer together. Using this unique property, DLH naturally discovers the spatiotemporal structure of a dataset and learns disentangled representations across its hierarchy. We hypothesise that this simplifies the task of modeling temporal dynamics of a video, improves the learning of long-term dependencies, and reduces error accumulation. As evidence, we demonstrate that DLH outperforms state-of-the-art benchmarks in video prediction, is able to better represent stochasticity, as well as to dynamically adjust its hierarchical and temporal structure. Our paper shows, among other things, how progress in representation learning can translate into progress in prediction tasks.
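For concreteness, a two-component latent of the kind described, with one mode for the immediate past and one for the predicted future, can be written as a Gaussian mixture (a toy construction with assumed shapes and a tensor-valued mixing weight, not the DLH model itself):

import torch
import torch.distributions as D

def two_mode_latent(mu_past, sigma_past, mu_future, sigma_future, w):
    """Gaussian mixture over a latent state: one component for the
    immediate past, one for the predicted future, mixed by tensor w."""
    mix = D.Categorical(torch.stack([1 - w, w], dim=-1))
    comp = D.Independent(
        D.Normal(torch.stack([mu_past, mu_future], dim=-2),
                 torch.stack([sigma_past, sigma_future], dim=-2)), 1)
    return D.MixtureSameFamily(mix, comp)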
Implicit regularization is an important way to interpret neural networks. Recent theory has begun to explain implicit regularization with the model of deep matrix factorization (DMF) and to analyze the trajectory of discrete gradient dynamics in the optimization process. The step sizes of these discrete dynamics are relatively small but not infinitesimal, fitting well with the practical implementation of neural networks. Currently, discrete gradient dynamics analysis has been successfully applied to shallow networks but runs into prohibitively complex computation for deep networks. In this work, we introduce another discrete gradient dynamics approach to explain implicit regularization, i.e., landscape analysis. It mainly focuses on regions around critical points, such as saddle points and local minima. We theoretically establish the connection between saddle point escaping (SPE) stages and the matrix rank in DMF. We prove that, for a rank-R matrix reconstruction, DMF converges to a second-order critical point after R stages of SPE. This conclusion is further verified experimentally on a low-rank matrix reconstruction problem. This work provides a new theory for analyzing implicit regularization in deep learning.
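For reference, a standard deep matrix factorization setup of this kind (notation assumed here, not copied from the paper) parameterizes the estimate as a product of N factors and runs gradient descent with a small but finite step size:

\[
  W = W_N W_{N-1} \cdots W_1, \qquad
  \min_{W_1,\dots,W_N} \; \mathcal{L}(W)
  = \tfrac{1}{2}\,\bigl\| \mathcal{P}_\Omega(W - M) \bigr\|_F^2 ,
\]
\[
  W_i^{(t+1)} = W_i^{(t)} - \eta\, \nabla_{W_i} \mathcal{L}\bigl(W^{(t)}\bigr),
  \qquad i = 1, \dots, N ,
\]
where $M$ is the rank-R target, $\mathcal{P}_\Omega$ restricts to observed entries, and $\eta$ is the small but non-infinitesimal step size referenced above.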
Gradient-based explanation is the cornerstone of explainable deep networks, but it has been shown to be vulnerable to adversarial attacks. Existing works measure explanation robustness with the $\ell_p$-norm, which can be counter-intuitive to humans, who only pay attention to the top few salient features. We propose explanation ranking thickness as a more suitable explanation robustness metric. We then present a new practical adversarial attacking goal for manipulating explanation rankings. To mitigate the ranking-based attacks while maintaining computational feasibility, we derive tractable surrogate bounds of the thickness, which would otherwise involve expensive sampling and integration. We use a multi-objective approach to analyze the convergence of a gradient-based attack, confirming that explanation robustness can be measured by the thickness metric. Experiments on various network architectures and diverse datasets demonstrate the superiority of the proposed methods; in particular, the widely adopted Hessian-based curvature-smoothing approaches are not as robust as our method.
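As a crude, sampling-free proxy for how stable a top-k saliency ranking is under perturbation (an illustration of the ranking-robustness idea, not the paper's thickness metric; a scalar-output model is assumed), one can compare top-k index sets before and after an attack:

import torch

def saliency_topk(model, x, k=10):
    """Top-k feature ranking from input-gradient saliency."""
    x = x.clone().requires_grad_(True)
    model(x).sum().backward()
    return x.grad.abs().flatten(1).topk(k, dim=1).indices

def topk_overlap(model, x, x_adv, k=10):
    """Fraction of top-k salient features preserved under perturbation."""
    a, b = saliency_topk(model, x, k), saliency_topk(model, x_adv, k)
    keep = [len(set(r.tolist()) & set(s.tolist())) for r, s in zip(a, b)]
    return torch.tensor(keep, dtype=torch.float) / k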